Building a Corpus of Manually Revised Texts from Discourse Perspective
Abstract
This paper presents the construction of a corpus of manually revised texts that includes both pre-revision and post-revision information. To create such a corpus, we propose a procedure for revising a text from a discourse perspective. The procedure consists of dividing a text into discourse units, organising and reordering groups of discourse units, and finally modifying referring and connective expressions; each step imposes limits on the freedom of revision. Following the procedure, six revisers with sufficient experience in either teaching Japanese or scoring Japanese essays revised 120 Japanese essays written by native speakers of Japanese. Comparing the original and revised texts, we found that some specific manual revisions occurred frequently, e.g. ‘thesis’ statements were frequently placed at the beginning of a text. We also evaluate text coherence models on the original and revised texts in a pairwise information ordering task, in which the more coherent of two texts must be identified. In experiments with two text coherence models, neither model outperformed the random baseline.
Publication date: 2014